The nucleotide sequence of the unique region of coronavirus MHV-A59 mRNA 2 has been determined. Two open
reading frames (ORF) are predicted: ORF1 potentially encodes a protein of 261 amino acids; its amino acid sequence
contains elements which indicate nucleotide binding properties. ORF2 predicts a 413-amino-acid protein; it lacks a
translation initiation codon and is therefore probably a pseudogene. The amino acid sequence of ORF2 shares 30%
homology with the HA1 hemagglutinin sequence of influenza C virus. A short stretch of nucleotides immediately
upstream of ORF2 shares 83% homology with MHC class I nucleotide sequences. We discuss the possibility that both
similarities are the result of recombinations and present a model for the acquisition and the subsequent inactivation of
ORF2; the model also applies to MHV-A59-related coronaviruses, in which we expect ORF2 to be still functional.
INTRODUCTION


Murine hepatitis virus (MHV) is the most widely studied member of the Coronaviridae.
This family of enveloped, single-stranded RNA viruses causes considerable economic loss,
since coronavirus infections can severely affect cattle, poultry, and pets. Human
coronavirus OC43 causes the common cold in man. Murine coronaviruses are of particular
interest because several strains can cause a (chronic) demyelinating disease in rats and
mice. For this reason the pathogenesis of MHV infections is studied as an animal model for
virus-induced demyelination (Wege et al., 1982). MHV-A59 virions contain an infectious RNA
genome, about 30 kb in length, associated with a nucleocapsid protein (N). Two membrane
proteins have been identified: the transmembrane glycoprotein E1 and the large surface
glycoprotein E2 (Siddell et al., 1982). The MHV-A59 genome is composed of seven different
regions (A to G), separated by short, very similar junction sequences (Bredenbeek et al.,
1987). The messenger RNAs that are synthesized during infection are 3'-coterminal, and
each extends to a different junction sequence in the 5'-direction. This results in a nested
set of mRNAs, including the genome, in which each has a different "unique" region at its
5'-end (Leibowitz et al., 1981; Lai et al., 1981; Spaan et al., 1982). All mRNAs share a
leader sequence of about 72 nucleotides (Spaan et al., 1983; Lai et al., 1984). In vitro
translated MHV mRNAs encoding the structural proteins N, E1, and E2 and the 14.5K
nonstructural protein are functionally monocistronic (Rottier et al., 1981; Siddell, 1983),
and sequence analyses have shown that the coding regions are located
at the 5'-end of these individual mRNAs (Siddell, 1987). There is one possible exception:
sequence analysis of the 5'-end of mRNA 5 (region E) revealed two open reading frames
(Skinner et al., 1985; Budzilowicz and Weiss, 1987). Whether both reading frames are used
is not known.

The coronaviruses studied to date show an identical order of the genes encoding the
structural proteins: 5'-E2-E1-N-3' (De Groot et al., 1987). Between coronaviruses these
genes are highly homologous. In contrast, differences are found in the structure and
number of the genes encoding the nonstructural proteins, which is reflected in the number
of subgenomic mRNAs that are synthesized by each coronavirus. In infectious bronchitis
virus (IBV), feline infectious peritonitis virus (FIPV), and its close relative
transmissible gastroenteritis virus (TGEV), members of different antigenic clusters from
MHV, the largest subgenomic mRNA encodes the peplomer protein E2 or S (Binns et al., 1985;
Niesters et al., 1986; De Groot et al., 1987; Rasschaert and Laude, 1987; Jacobs et al.,
1987). In contrast, in MHV-infected cells an additional, larger RNA (mRNA 2) has been
identified (Spaan et al., 1981; Weiss and Leibowitz, 1983). In vitro translation of this
mRNA yields a 30K-35K protein (Leibowitz et al., 1982; Siddell, 1983). In
MHV-JHM-infected cells, small amounts of a 30K protein can be detected (Siddell et al.,
1981). However, the size of the unique region of mRNA 2, approximately 2 kb, indicates a
larger coding capacity.

In order to study the function of mRNA 2 we have cloned and sequenced region B of
MHV-A59. Here we present its primary structure and show that it contains two open reading
frames (ORF). The predicted amino acid sequence of the second ORF is remarkably similar
to the HA1 sequence of the hemagglutinin protein of influenza C virus. We discuss the
possibility that this ORF has been acquired by a recombination event.


MATERIALS AND METHODS 
cDNA synthesis and cloning 


An MHV-A59-specific cDNA library was created using random primers on purified genomic
RNA. Procedures were identical to those described previously (Luytjes et al., 1987). Full
details will be presented elsewhere (P. J. Bredenbeek et al., manuscript in preparation).


Selection and analysis of cDNA clones 


Recombinant cDNA clones were selected by hybridization (Meinkoth and Wahl, 1984) to
oligonucleotide probes specific for the viral mRNAs (P. J. Bredenbeek et al., manuscript
in preparation). Plasmid DNA from recombinant clones was prepared according to Birnboim
and Doly (1979). Inserts were subcloned into M13 vectors (Messing, 1983). Selection of M13
subclones specific for the unique region of mRNA 2 was performed by hybridizing phage
supernatant to pentamer-primed probes (Feinberg and Vogelstein, 1983; Roberts and Wilson,
1985) from previously oligonucleotide-selected cDNA clones.


DNA sequence analysis 


Sequence analysis was done essentially according to Sanger et al. (1977). Computer
assembly of sequence data was performed using the Staden program set (1986).


Similarity search of protein sequences 


The predicted amino acid sequences were com- 
pared to the National Biomedical Research Foundation 
(NBRF) Protein Library (release 11) using the FASTP 
program set created by Lipman and Pearson (1985). 
Additional analysis of similarities was carried out with 
the DIAGON program of Staden (1982). 


RESULTS 
Isolation of region B specific cDNA clones 


We have recently constructed an almost complete random-primed cDNA library of the MHV-A59
genome. A set of oligonucleotides was synthesized, based upon the sequence of previously
obtained MHV-A59-specific cDNA clones which had been mapped on the viral mRNAs (P. J.
Bredenbeek, manuscript in preparation).


Oligonucleotides OL 4 (specific for mRNA 1), OL 6 (mRNA 2), and OL 7 (mRNA 3, see Luytjes
et al., 1987) were used to screen the cDNA library for clones covering region B. Two
completely overlapping clones (30, 96) and several clones with partial overlaps (4D, 35,
F71, 95, 918) were isolated. Clone 96 was digested with Sau3A and subsequently ligated
into the BamHI site of M13mp9. The other selected cDNA clones were subcloned using
restriction enzymes as indicated in Fig. 1. Each nucleotide of region B was determined on
at least two different cDNA clones and selected regions on three or more cDNA clones.


Identification of the unique region of mRNA 2


The 3'-end of region B has already been identified at the junction sequence
5'-UAAUCUAAAC-3', which separates it from the peplomer coding sequence (Luytjes et al.,
1987). The only other potential junction sequence within the consensus sequence of the
region B-specific cDNA clones was found at position -9589 (Fig. 1) from the start of the
poly(A)-tail of the genome: 5'-AAAUCUAUAC-3' (Fig. 2). Immediately upstream of this
sequence an ORF terminates, the primary structure of which shows a high similarity to the
3'-terminal sequence of the unique region of IBV mRNA F (Boursnell et al., 1987, and data
not shown). This strongly suggests that the junction sequence at position -9589
corresponds to the 5'-end of the unique region of mRNA 2.
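
As a rough illustration of how a candidate junction sequence can be located, the sketch below slides a 10-nucleotide window along an RNA sequence and reports windows that differ from the known B/C junction (UAAUCUAAAC) at no more than a chosen number of positions. The function names, the mismatch threshold of 2, and the flanking bases of the toy sequence are assumptions made for this example; this is not the procedure used in this study.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_junction_candidates(rna: str, reference: str, max_mismatches: int = 2):
    """Return (offset, window, mismatches) for every window close to the reference."""
    k = len(reference)
    hits = []
    for i in range(len(rna) - k + 1):
        window = rna[i:i + k]
        d = hamming(window, reference)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


REFERENCE_JUNCTION = "UAAUCUAAAC"  # junction between regions B and C (from the text)
CANDIDATE_JUNCTION = "AAAUCUAUAC"  # candidate at position -9589 (from the text)

# Toy sequence: the flanking GGCC bases are hypothetical padding.
toy_rna = "GGCC" + CANDIDATE_JUNCTION + "GGCC"

mismatches = hamming(REFERENCE_JUNCTION, CANDIDATE_JUNCTION)
hits = find_junction_candidates(toy_rna, REFERENCE_JUNCTION)
print(mismatches, hits)
```

On these inputs the two junction sequences differ at two of their ten positions, which is why a threshold of 2 is used in the sketch.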


Nucleotide and amino acid sequence 


The consensus nucleotide sequence of region B is 2176 residues long (Fig. 2). It contains
two open reading frames. The first open reading frame (ORF1) starts 18 nucleotides
downstream from the junction sequence and is 261 amino acids (aa) long. The second ORF
(ORF2) starts 903 nucleotides downstream and is 413 aa long. It terminates 23 nucleotides
upstream from the junction sequence that separates regions B and C (the peplomer gene).
Between ORF1 and ORF2 lies a stretch of 92 nucleotides with several termination codons in
each reading frame (see Fig. 2).
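
The notion of an open reading frame used here (a stretch uninterrupted by termination codons, which need not begin with AUG, as is the case for ORF2) can be sketched as a scan of the three forward reading frames. The function name, the minimum-length cutoff, and the toy DNA string are assumptions made for this example; the actual analysis was carried out with the programs cited in Materials and Methods.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # ochre, amber, opal (standard genetic code)


def open_reading_frames(dna: str, min_codons: int = 3):
    """Find stop-free stretches in the three forward reading frames.

    Returns (frame, start, end, n_codons) tuples; start and end are 0-based
    nucleotide offsets (end exclusive) and the stop codon itself is excluded.
    A stretch does not have to begin with ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        last = frame  # end of the last complete codon seen in this frame
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                n = (i - run_start) // 3
                if n >= min_codons:
                    orfs.append((frame, run_start, i, n))
                run_start = i + 3
            last = i + 3
        # Stretch that runs to the end of the sequence without a stop codon.
        n = (last - run_start) // 3
        if n >= min_codons:
            orfs.append((frame, run_start, last, n))
    return orfs


toy_dna = "ATGGCTAAATGACCCGGGTTTAAATAG"  # hypothetical 27-nt example
for orf in open_reading_frames(toy_dna):
    print(orf)
```

Raising `min_codons` filters out the short stop-free stretches that occur by chance, such as those expected in the 92-nucleotide region between ORF1 and ORF2.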


Analysis of ORF1


In ORF1 three potential translation initiation codons 
can be found. The first AUG is in a strong context (Ko- 
zak, 1986) and is therefore most probably used. The 
coding capacity of ORF1 is 30K, which is in agreement 
with the products obtained after in vitro translation of 
mRNA 2. There are no membrane protein sequence 
characteristics, such as a signal sequence, a trans- 
membrane anchor sequence, or potential N-glycosyla- 
tion sites. Diagon comparison (Staden, 1982) of the 
Fig. 1. Cloning and sequencing strategy of the MHV-A59 region B. The upper line represents the MHV genome. Symbols indicate the
restriction enzyme recognition sites (specified in the figure) used in subcloning. Vertical bars and the negative numbers above mark the starts of the
junction sequences and the distances to the start of the poly(A)-tail of the genome. The arrow points to the position of oligonucleotide 6 (OL 6).
Open boxes represent open reading frames. Pol, polymerase; E2, peplomer protein; ORF1 and ORF2 are region B open reading frames.
Numbered bars refer to cDNA clones; direction and extent of sequencing of subclones are indicated by the arrows below.


ORF1 amino acid sequence with available sequences of other coronaviruses did not reveal
any similarities. A FASTP similarity search (Lipman and Pearson, 1985) of the NBRF protein
library produced an alignment to several proteins with nucleotide binding properties (data
not shown). Recently, consensus sequence elements have been published, for which an
involvement in nucleotide binding is proposed (Dever et al., 1987; Fry et al., 1986).
Three regions in the ORF1 sequence match these elements (Fig. 3).
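
A search for short consensus elements of this kind can be sketched with regular expressions. The example below looks for two elements of the form D-X-X-G and N-K-X-D (X, any amino acid). The relaxed second pattern, which also accepts S in the first position so that an element such as S-K-I-D is reported, the function name, and the toy peptide are all assumptions made for illustration and do not reproduce the published consensus definitions in full.

```python
import re

# X = any amino acid. The [NS] relaxation is an assumption of this sketch.
PATTERNS = {
    "D-X-X-G": re.compile(r"D..G"),
    "N/S-K-X-D": re.compile(r"[NS]K.D"),
}


def find_elements(protein: str):
    """Return {element name: [(1-based position, matched residues), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]
        for name, pattern in PATTERNS.items()
    }


# Hypothetical peptide: two elements separated by an alanine spacer.
toy_peptide = "MA" + "DVRG" + "A" * 10 + "SKID" + "LL"
elements = find_elements(toy_peptide)
print(elements)
```

The positions returned can then be used to check whether the spacing between elements falls within the range allowed by the consensus.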


Analysis of ORF2 


ORF2 does not start with an AUG codon; the first potential initiation codon within ORF2
is found at position 110. Interestingly, in the region upstream of ORF2 an AUG codon
(position 879) is found in a favorable context, which precedes a short reading frame,
separated from ORF2 by only one opal termination codon (Fig. 2). This short reading frame
is 90% homologous (83% at the nucleotide level) to the N-terminus of the signal sequence
of several MHC class I genes (Fig. 4; Schepart et al., 1986). There is no other
significant similarity between class I sequences and any MHV sequence. The region
overlapping the end of ORF1 and the beginning of ORF2 has been sequenced on three
independent cDNA clones. The sequences are identical, excluding the possibility that the
presence of the termination codon is a cloning or sequencing artifact.

The sequence of ORF2 shows characteristics of a membrane protein sequence: the C-terminal
hydrophobic residues (underlined in Fig. 2) could provide a membrane anchor and 10
potential N-glycosylation sites are present.
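
Potential N-glycosylation sites are conventionally identified as the tripeptide N-X-S or N-X-T, where X is any residue except proline. A minimal sketch of such a scan is given below; the function name and the toy peptide are assumptions made for this example, and a lookahead is used so that overlapping sites are all reported.

```python
import re

# N, then any residue except P, then S or T. The lookahead makes the match
# zero-width, so overlapping sequons (e.g. N-N-T-S) are each found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_sites(protein: str):
    """Return [(1-based position of N, tripeptide), ...] for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


toy_peptide = "MNFSANPTNNTSQ"  # hypothetical example; N-P-T is not a sequon
sites = n_glycosylation_sites(toy_peptide)
print(sites)
```

Such a count gives only potential sites; whether a given site is used depends on the protein entering the secretory pathway and on its local structure.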

The most remarkable aspect of the ORF2 sequence came from FASTP analysis of the NBRF
protein library: the predicted amino acid sequence encoded by ORF2 shows a 30% homology
with the HA1 sequence of the hemagglutinin protein of influenza C virus (Nakada et al.,
1984; Pfeiffer and Compans, 1984). The alignment presented in Fig. 5 shows that several
regions are completely identical and that many conservative substitutions (Dayhoff et al.,
1983) are present.

We could not detect similarities between the pre- 
dicted ORF2 amino acid sequence and other influenza 
C (or A or B) virus sequences, nor was there any similar- 
ity to available coronavirus sequences. 
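
A percentage such as the 30% quoted for ORF2 and HA1 depends on how identity is counted over a gapped alignment. The sketch below uses one common convention, identical residues divided by the number of aligned (gap-free) columns; this convention, the function name, and the two short aligned strings are assumptions made for illustration and are not taken from the FASTP output.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (11 columns, one gap column).
row_a = "FGDSR-DCTYV"
row_b = "FGDSRSDATWV"
identity = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity")
```

Other conventions (for example dividing by the length of the shorter sequence) give somewhat different values for the same alignment.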


DISCUSSION 


In this paper we present the primary structure of the unique region of MHV-A59 mRNA 2.
Sequence analysis revealed two ORFs. ORF1 has a coding capacity of 30K. In vitro
translation of mRNA 2 of MHV-JHM (Siddell, 1983) and MHV-A59 (Leibowitz et al., 1982)
yielded a 30K protein. Also in MHV-JHM-infected cells small amounts of a 30K protein have
been detected (Siddell et al., 1981). This suggests that this protein is encoded by ORF1
from mRNA 2. We assume that the ORF1 translation product is initiated at the 5'-proximal
AUG since this codon is in a preferred context (Kozak, 1986). The presence of three
consensus elements in

[Fig. 2 sequence listing: nucleotides 1-2186 of MHV-A59 region B with the predicted amino acid sequences of ORF1 (residues 1-261) and ORF2 (residues 1-413).]

Fig. 2. Nucleotide sequence of the MHV-A59 region B and predicted amino acid sequence of the open reading frames ORF1 and ORF2.
Junction sequences (see text) are boxed. The start of the open reading frames is indicated by the arrows below the sequence. The region
between ORF1 and ORF2 is translated in three reading frames. The hydrophobic C-terminus of ORF2 is underlined. ORF1 is numbered 1 (M)-261 (C),
ORF2 is numbered 1 (C)-413 (G). Nucleotide numbering starts at relative position -9589 from the start of the poly(A)-tail. Single letter
amino acid code is used.

[Fig. 3 alignment (partial): consensus ... D-X-X-G /40-80/ N-K-X-D; ORF1 ... D-V-R-G /81/ S-K-I-D.]

Fig. 3. Alignment of ORF1 of MHV-A59 region B to sequence elements which are proposed to be involved in nucleotide binding. ATP,
sequence elements involved in ATP binding. GTP, sequence elements involved in GTP binding. ORF1, first open reading frame of region B of
MHV-A59. The numbers represent the distances between the elements (the first number is the distance to the start of the sequences). X, any
amino acid; B, hydrophobic amino acids L, V, F, Y, I. Data are taken from Fry et al. (1986) and Dever et al. (1987).


the sequence of ORF1 with possible nucleotide binding and phosphorylating properties
(Dever et al., 1987; Fry et al., 1986) suggests a role for its product in virus
replication or phosphorylation of the nucleocapsid protein (Siddell et al., 1982).
Experiments are in progress to establish whether the ORF1 product is essential for MHV,
in view of the fact that an mRNA 2 is absent in cells infected with coronaviruses from
other antigenic clusters.

Unexpected was the presence of a second open reading frame, ORF2, located between ORF1
and the peplomer gene, without a translation initiation codon, showing a remarkable amino
acid similarity to the HA1 sequence of influenza C virus. The percentage of identity is
high enough to rule out convergent evolution (Dayhoff et al., 1983; Doolittle, 1981). We
believe that this similarity is the result of a recombination between coronaviruses and
influenza C virus. Recent studies have indicated that coronaviruses are indeed capable of
recombination. Makino et al. (1986) described homologous recombination between
coronaviruses in mixed infections; the stretch of 267 nucleotides that we have found in
the MHV-A59 peplomer gene and that is absent in MHV-JHM (Luytjes et al., 1987) could
indicate a nonhomologous recombination.

In MHV-A59-infected cells a protein that can be 
assigned to ORF2 has never been detected (Siddell 
et al., 1982). Since nonfunctional reading frames of 
RNA viruses show a high rate of mutation (Holland et 
al., 1982), ORF2 must be either functional or the result 
of recent genetic changes. In the first case, possible 
ways of translating ORF2 would be either internal initia- 
tion at AUG codons in suboptimal contexts (which is 
unlikely) or protein initiation at an upstream AUG codon 
at position -33 from the start of ORF2 and read-through of the opal termination codon at
position -3. Opal suppression has been reported for RNA viruses (Strauss et al., 1983;
Morch et al., 1987) and can be an important feature of the viral translation strategy.
Internal initiation combined with read-through of an opal termination codon would
probably lead to undetectable amounts of protein in infected cells. The number and
location of termination codons in the region between ORF1 and ORF2 exclude the possibility
of frame shifting.
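
The read-through scenario can be made concrete with a small translation sketch in which an in-frame opal (UGA) codon is either treated as a terminator or suppressed. The function name, the use of 'X' for the residue inserted at the suppressed codon, and the toy RNA string are assumptions made for this example; they do not model the MHV-A59 sequence itself.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str, read_through_opal: bool = False) -> str:
    """Translate from the first AUG until a termination codon.

    With read_through_opal=True an in-frame UGA is written as 'X' (the
    inserted residue is left unspecified) and translation continues.
    Input must contain only A, C, G and U.
    """
    rna = rna.upper()
    start = rna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            if codon == "UGA" and read_through_opal:
                protein.append("X")
                continue
            break
        protein.append(aa)
    return "".join(protein)


toy_rna = "GGAUGGCUCCUUGAUGCCAAUAAGG"  # hypothetical: AUG GCU CCU UGA UGC CAA UAA
print(translate(toy_rna))                          # stops at the opal codon
print(translate(toy_rna, read_through_opal=True))  # continues to the ochre codon
```

In the toy case the short upstream frame alone yields a three-residue peptide, whereas suppression of the opal codon fuses it to the downstream frame.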

In the second case ORF2 could have been acquired recently by recombination between MHV and
influenza C virus. However, there is considerable evolutionary distance between both
viruses: the nucleotide sequences of ORF2 and the HA1 gene are not similar and the codon
usage in both reading frames is different (data not shown). Therefore, recombination must
have taken place between ancestors of these viruses. This means that closely related
coronaviruses should exist in which ORF2 is still expressed and that ORF2 in MHV-A59 must
have been recently inactivated by genetic changes. An ORF2 product would range in size
from 45K (unglycosylated) to 65K (N-glycosylated) and several coronaviruses containing
additional proteins in this range have been reported. MHV-JHM, which shares at least 87%
homology with MHV-A59 in the nucleotide sequences from the peplomer gene down to the
poly(A)-tail (Luytjes et al., 1987), encodes one additional glycoprotein: gp 65 (Siddell,
1982). Sequence data indicate that the corresponding gene must be located upstream of the
peplomer protein gene. Taguchi et al. (1985, 1986) described a JHM variant (CNS) which
shows a high expression level of a 65K protein
Fig. 4. Alignment of the MHV-A59 sequence around the start of ORF2 from region B and an MHC class I mRNA: H2-D°. The MHV-A59 sequence
is numbered according to Fig. 2. The H2 sequence is taken from Schepart et al. (1986). The H2-D° amino acid sequence depicted represents
the signal sequence. Identical nucleotides are marked with lines and identical amino acids are boxed.

[Fig. 5 alignment: influenza C virus HA1 (and the start of HA2) aligned with MHV-A59 ORF2, residues 1-413.]

Fig. 5. Alignment of the MHV-A59 ORF2 sequence from region B and the influenza C hemagglutinin HA1 sequence and part of the HA2
sequence. Identical residues are boxed, substitutions scoring 0 or positive according to Dayhoff et al. (1983) are indicated by colons. Dashes
represent gaps which were inserted to maximize similarity. The sequence was taken from Nakada et al. (1986).


and an additional mRNA 2a, intermediate in size between mRNA 2 and mRNA 3. Bovine
coronavirus (BCV) shows a strong similarity to MHV-A59 in the nucleocapsid and matrix
protein sequences (Lapps et al., 1987) and it contains an additional spike protein E3, a
hemagglutinin (King et al., 1985; Deregt et al., 1987). The size of the hemagglutinin
monomer is 65K and BCV also encodes an mRNA 2a (Keck et al., 1987). The data on these
coronaviruses lead us to suggest that ORF2 in MHV-A59 corresponds to the reading frames
Fig. 6. Schematic representation of the recombination and mutation events that could have
led to the situation in MHV-A59 region B. Drawn lines indicate recombination between the
regions marked by dark horizontal bars. POL, polymerase; ORF1 and ORF2 are the reading
frames of MHV-A59 region B; E2, peplomer protein gene; HA, hemagglutinin; E3, putative
membrane protein (gp 65 of MHV-JHM, hemagglutinin of BCV and OC43); MHC, class I MHC mRNA.


encoding gp 65 in JHM and the 65K hemagglutinin E3 in BCV and that these genes are located
on a separate mRNA 2a in the JHM CNS variant and in BCV. Junction sequences are involved
in the initiation of coronavirus mRNAs. The apparent absence of a junction sequence
upstream of ORF2 in MHV-A59 explains the absence of an mRNA 2a in infected cells (Spaan
et al., 1981; Weiss and Leibowitz, 1983). This could have been the result of an
accumulation of recent point mutations. However, the strong similarity at both the amino
acid and the nucleotide levels between the region immediately upstream of the opal
termination codon (in front of ORF2) and the 5'-end of the coding region of several MHC
class I mRNAs indicates that the initiation codon of ORF2 and the junction sequence
upstream were lost because of a recent nonhomologous recombination event with MHC mRNA.
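
The sizes quoted for the ORF1 product (30K) and for a putative ORF2 product (45K unglycosylated, up to 65K N-glycosylated) can be checked with back-of-envelope arithmetic. The sketch below assumes an average residue mass of 110 Da and about 2 kDa per N-linked glycan; both values are rule-of-thumb assumptions introduced for this example, not figures taken from the present data.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)
GLYCAN_MASS_DA = 2000.0      # rough mass of one N-linked glycan (assumption)


def approx_mass_kda(n_residues: int, n_glycans: int = 0) -> float:
    """Approximate protein mass in kDa from chain length and glycan count."""
    return (n_residues * AVG_RESIDUE_MASS_DA + n_glycans * GLYCAN_MASS_DA) / 1000.0


orf1_mass = approx_mass_kda(261)                     # ORF1, 261 aa
orf2_mass = approx_mass_kda(413)                     # ORF2, 413 aa, no glycans
orf2_glyco_mass = approx_mass_kda(413, n_glycans=10) # all 10 potential sites used

print(f"ORF1: {orf1_mass:.1f} kDa")
print(f"ORF2: {orf2_mass:.1f} kDa unglycosylated, {orf2_glyco_mass:.1f} kDa glycosylated")
```

Under these assumptions the estimates fall close to the 30K, 45K, and 65K values discussed in the text, the upper value corresponding to use of all 10 potential N-glycosylation sites.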

The suggested homology between ORF2 of MHV-A59 and the BCV E3 gene leads us to propose a
model for the relation between several coronaviruses in the antigenic cluster of MHV.
Human coronavirus OC43 is closely related to BCV (Lapps and Brian, 1985) and shows
sequence similarity to MHV-A59 (Hogue et al., 1984; Weiss, 1983). Since OC43 and influenza
C virus cause a similar infection in humans (McIntosh et al., 1969; Katagiri et al., 1983)
OC43 could have acquired its hemagglutinin gene in a mixed infection. More likely,
coinfection of another coronavirus with influenza C virus followed by recombination gave
rise to the new coronavirus OC43. The hemagglutinin gene of OC43 and BCV would then be the
evolutionary intermediate between influenza C virus HA and MHV ORF2 (see Fig. 6). This
model is supported by recent experiments performed in cooperation with Drs. R. Vlasak and
P. Palese (Vlasak et al., 1988) which show that BCV and OC43 recognize the same receptor
and possess the same esterase activity as has been reported for the influenza C virus
hemagglutinin protein (Vlasak et al., 1987).

It has been suggested that virus evolution is a modular event, in which viral genomes are
the result of the assembly of a set of primitive genes (see Goldbach, 1987). This
mechanism can offer an alternative explanation for the relation between MHV and influenza
C virus. However, the similarity with MHC RNA and the previously reported extra stretch of
nucleotides in the A59 peplomer gene (Luytjes et al., 1987) indicate that coronaviruses
are probably capable of nonhomologous recombination during replication. To date
nonhomologous recombination at the RNA level in animal RNA viruses has been reported only
for defective interfering RNA (see King et al., 1987). Coronaviruses are the first example
of nontumor RNA viruses being able to take up directly into their genome genetic material
from the host cell. This may be a strong force in generating strains with new host spectra
and tissue tropisms and could have important implications for the prevention of
coronavirus infections.